Sparse Interpretable Audio Model

Table of Contents

  - Model Architecture
  - Cite this Work
  - Sound Samples

Model Architecture

This small model attempts to decompose audio featuring acoustic instruments into the following components:

  - a single global context vector for the recording
  - a sparse set of local event vectors
  - a timing for each event

While the global context and local event data are encoded as real-valued vectors rather than discrete values, the learned representation still lends itself to a sparse, interpretable, and hopefully easy-to-manipulate encoding.

Each sound sample below includes the following elements:

  1. The original recording
  2. The model's reconstruction
  3. New audio using the original timing and context vector, but random event vectors
  4. New audio using the original event and context vectors, but with random timings
  5. New audio using the original timing and event vectors, but with a random global context vector
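The three randomized variants above can be sketched as operations on the encoding. A minimal illustration, assuming a `(context, event_vectors, timings)` triple with hypothetical dimensions — the post does not state the model's actual sizes, and the decoder that renders audio from an encoding is not shown here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the actual model's sizes are not given in the post.
n_events, event_dim, context_dim = 16, 64, 32

# A sparse encoding: a handful of events, each a real-valued vector paired
# with a scalar time in seconds, plus one global context vector.
event_vectors = rng.standard_normal((n_events, event_dim))
timings = np.sort(rng.uniform(0.0, 5.0, size=n_events))
context = rng.standard_normal(context_dim)

def randomize(encoding, part, rng):
    """Return a copy of (context, event_vectors, timings) with one part resampled."""
    context, events, timings = encoding
    if part == "events":
        events = rng.standard_normal(events.shape)
    elif part == "timings":
        timings = np.sort(rng.uniform(0.0, 5.0, size=timings.shape))
    elif part == "context":
        context = rng.standard_normal(context.shape)
    return context, events, timings

encoding = (context, event_vectors, timings)

# Variants 3-5 above: each randomized encoding would be fed to the decoder
# to render new audio.
variants = {p: randomize(encoding, p, rng) for p in ("events", "timings", "context")}
```

Because only one part of the triple is resampled per variant, comparing the rendered audio against the reconstruction isolates what each part of the encoding controls.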

Cite this Work

@misc{vinyard2023audio,
    author = {Vinyard, John},
    title = {Sparse Interpretable Audio},
    url = {https://JohnVinyard.github.io/machine-learning/2023/11/15/sparse-physical-model.html},
    year = 2024
}

Sound Samples

Original

Recon

With Random Event Vectors

With Random Timings

With Random Global Context Vector

Global Context Vector for Original

Timeline
